METHOD AND APPARATUS FOR MULTI-RATE ENCODING OF VIDEO SEQUENCES 
FIELD OF THE INVENTION 

[0001] The present invention pertains to the encoding of information. More particularly, 
the present invention relates to a method and apparatus for multi-rate encoding of video 
sequences. 

BACKGROUND OF THE INVENTION 

[0002] Different communications media have different bandwidth capability. A signal 
needing to be sent may exceed the bandwidth of a particular medium. One method to 
reduce the bandwidth is to encode the signal. However, the signal may need to be sent 
through various media having different bandwidth capability. Thus, encoding the signal at a 
single bit rate may pose problems. For example, a video sequence encoded to have the 
highest quality for a given digital subscriber line (xDSL) may have too high a bit rate for 
transmission through a 56K modem line. 

[0003] Numerous approaches to doing multi-rate encoding have been tried. One obvious 
approach is to encode the data, such as a video sequence, at multiple different bit rates by 
using an encoder and running the video sequence through the encoder as many times as 
there are bit streams to generate, each time adjusting the encoder parameters so that the 
output data has the proper bit rate and/or quality level. Another obvious, brute-force 
approach is to run a group of encoders in parallel, each at a different bit rate as 
required. 

BRIEF DESCRIPTION OF THE DRAWINGS 

[0004] The present invention is illustrated by way of example and not limitation in the 
figures of the accompanying drawings, in which like references indicate similar elements 
and in which: 

[0005] Figure 1 illustrates a networked computer environment; 
[0006] Figure 2 is a block diagram of a computer system; 

[0007] Figure 3 illustrates in block diagram form one embodiment of a multi-rate encoder; 
[0008] Figure 4 illustrates a time sequence for one embodiment of a multi-rate encoder; 
[0009] Figure 5 illustrates one embodiment for an encoder for the first stream; 
[0010] Figure 6 illustrates one embodiment for an encoder for subsequent streams; 
[0011] Figure 7 illustrates one embodiment for motion compensation; and 
[0012] Figure 8 illustrates a prior art decoder for streams. 

DETAILED DESCRIPTION 

[0013] A method and apparatus for multi-rate encoding of information are described. 
[0014] For purposes of discussing the invention, it is to be understood that various terms 
are used by those knowledgeable in the art to describe techniques and approaches. 
[0015] In the following description, for purposes of explanation, numerous specific details 
are set forth in order to provide a thorough understanding of the present invention. It will be 
evident, however, to one skilled in the art that the present invention may be practiced 
without these specific details. In some instances, well-known structures and devices are 
shown in block diagram form, rather than in detail, in order to avoid obscuring the present 
invention. These embodiments are described in sufficient detail to enable those skilled in 
the art to practice the invention, and it is to be understood that other embodiments may be 
utilized and that logical, mechanical, electrical, and other changes may be made without 
departing from the scope of the present invention. 

[0016] Some portions of the detailed descriptions that follow are presented in terms of 
algorithms and symbolic representations of operations on data bits within a computer 
memory. These algorithmic descriptions and representations are the means used by those 
skilled in the data processing arts to most effectively convey the substance of their work to 
others skilled in the art. An algorithm is here, and generally, conceived to be a 
self-consistent sequence of acts leading to a desired result. The acts are those requiring 
physical manipulations of physical quantities. Usually, though not necessarily, these 
quantities take the form of electrical or magnetic signals capable of being stored, 
transferred, combined, compared, and otherwise manipulated. It has proven convenient at 
times, principally for reasons of common usage, to refer to these signals as bits, values, 
elements, symbols, characters, terms, numbers, or the like. 

[0017] It should be borne in mind, however, that all of these and similar terms are to be 
associated with the appropriate physical quantities and are merely convenient labels 
applied to these quantities. Unless specifically stated otherwise as apparent from the 
following discussion, it is appreciated that throughout the description, discussions utilizing 
terms such as "processing" or "computing" or "calculating" or "determining" or "displaying" 
or the like, refer to the action and processes of a computer system, or similar electronic 
computing device, that manipulates and transforms data represented as physical 
(electronic) quantities within the computer system's registers and memories into other data 
similarly represented as physical quantities within the computer system memories or 
registers or other such information storage, transmission or display devices. 
[0018] The present invention can be implemented by an apparatus for performing the 
operations herein. This apparatus may be specially constructed for the required purposes, 
or it may comprise a general-purpose computer, selectively activated or reconfigured by a 
computer program stored in the computer. Such a computer program may be stored in a 
computer readable storage medium, such as, but not limited to, any type of disk including 
floppy disks, optical disks, compact disk read-only memories (CD-ROMs), and 
magneto-optical disks, read-only memories (ROMs), random access memories (RAMs), electrically 
programmable read-only memories (EPROMs), electrically erasable programmable 
read-only memories (EEPROMs), magnetic or optical cards, or any type of media suitable for 
storing electronic instructions, and each coupled to a computer system bus. 
[0019] The algorithms and displays presented herein are not inherently related to any 
particular computer or other apparatus. Various general purpose systems may be used 
with programs in accordance with the teachings herein, or it may prove convenient to 
construct more specialized apparatus to perform the required method. For example, any of 
the methods according to the present invention can be implemented in hard-wired circuitry, 
by programming a general-purpose processor or by any combination of hardware and 
software. One of skill in the art will immediately appreciate that the invention can be 
practiced with computer system configurations other than those described below, including 
hand-held devices, multiprocessor systems, microprocessor-based or programmable 
consumer electronics, digital signal processing (DSP) devices, network PCs, 
minicomputers, mainframe computers, and the like. The invention can also be practiced in 
distributed computing environments where tasks are performed by remote processing 
devices that are linked through a communications network. The required structure for a 
variety of these systems will appear from the description below. 
[0020] The methods of the invention may be implemented using computer software. If 
written in a programming language conforming to a recognized standard, sequences of 
instructions designed to implement the methods can be compiled for execution on a variety 
of hardware platforms and for interface to a variety of operating systems. In addition, the 
present invention is not described with reference to any particular programming language. 
It will be appreciated that a variety of programming languages may be used to implement 
the teachings of the invention as described herein. Furthermore, it is common in the art to 
speak of software, in one form or another (e.g., program, procedure, application...), as 
taking an action or causing a result. Such expressions are merely a shorthand way of 
saying that execution of the software by a computer causes the processor of the computer 
to perform an action or produce a result. 

[0021] It is to be understood that various terms and techniques are used by those 
knowledgeable in the art to describe communications, protocols, applications, 
implementations, mechanisms, etc. One such technique is the description of an 
implementation of a technique in terms of an algorithm or mathematical expression. That 
is, while the technique may be, for example, implemented as executing code on a 
computer, the expression of that technique may be more aptly and succinctly conveyed and 
communicated as a formula, algorithm, or mathematical expression. Thus, one skilled in 
the art would recognize a block denoting A+B=C as an additive function whose 
implementation in hardware and/or software would take two inputs (A and B) and produce a 
summation output (C). Thus, the use of formula, algorithm, or mathematical expression as 
descriptions is to be understood as having a physical embodiment in at least hardware 
and/or software (such as a computer system in which the techniques of the present 
invention may be practiced as well as implemented as an embodiment). 
[0022] A machine-readable medium is understood to include any mechanism for storing or 
transmitting information in a form readable by a machine (e.g., a computer). For example, a 
machine-readable medium includes read only memory (ROM); random access memory (RAM); 
magnetic disk storage media; optical storage media; flash memory devices; electrical, optical, 
acoustical or other form of propagated signals (e.g., carrier waves, infrared signals, digital 
signals, etc.); etc. 

[0023] Figure 1 illustrates a network environment in which the techniques described may be 
applied. As shown, several computer systems in the form of M servers 104-1 through 104-M 

Patent Application 8 Docket 42390P 1 1262 



and N clients 108-1 through 108-N are connected to each other via a network 102, which may be, for 
example, the Internet. Note that alternatively the network 102 might be or include one or more 
of: a Local Area Network (LAN), Wide Area Network (WAN), satellite link, fiber network, cable 
network, or a combination of these and/or others. The method and apparatus described herein 
may be applied to essentially any type of communicating means or device whether local or 
remote, such as a LAN, a WAN, a system bus, a disk drive, storage, etc. 
[0024] Figure 2 illustrates a conventional personal computer in block diagram form, which may 
be representative of any of the clients and servers shown in Figure 1. The block diagram is a 
high level conceptual representation and may be implemented in a variety of ways and by 
various architectures. Bus system 202 interconnects a Central Processing Unit (CPU) 204, 
Read Only Memory (ROM) 206, Random Access Memory (RAM) 208, storage 210, display 220, 
audio 222, keyboard 224, pointer 226, miscellaneous input/output (I/O) devices 228, and 
communications 230. The bus system 202 may be, for example, one or more of such buses as a 
system bus, Peripheral Component Interconnect (PCI), Accelerated Graphics Port (AGP), Small 
Computer System Interface (SCSI), Institute of Electrical and Electronics Engineers (IEEE) 
standard number 1394 (FireWire), etc. The CPU 204 may be a single, multiple, or even a 
distributed computing resource. The ROM 206 may be any type of non-volatile memory, which 
may be programmable such as, mask programmable, flash, etc. RAM 208 may be, for example, 
static, dynamic, synchronous, asynchronous, or any combination. Storage 210 may be 
Compact Disc (CD), Digital Versatile Disk (DVD), hard disks (HD), optical disks, tape, flash, 
memory sticks, video recorders, etc. Display 220 might be, for example, a Cathode Ray Tube 
(CRT), Liquid Crystal Display (LCD), a projection system, Television (TV), etc. Audio 222 may 
be a monophonic, stereo, three dimensional sound card, etc. The keyboard 224 may be a 
keyboard, a musical keyboard, a keypad, a series of switches, etc. The pointer 226 may be, for 
example, a mouse, a touchpad, a trackball, joystick, etc. I/O devices 228 might be a voice 
command input device, a thumbprint input device, a smart card slot, a Personal Computer Card 
(PC Card) interface, virtual reality accessories, etc., which may optionally connect via an 
input/output port 229 to other devices or systems. An example of a miscellaneous I/O device 
228 would be a Musical Instrument Digital Interface (MIDI) card with the I/O port 229 connecting 
to the musical instrument(s). Communications device 230 might be, for example, an Ethernet 
adapter for local area network (LAN) connections, a satellite connection, a settop box adapter, a 
Digital Subscriber Line (xDSL) adapter, a wireless modem, a conventional telephone modem, a 
direct telephone connection, a Hybrid-Fiber Coax (HFC) connection, cable modem, etc. The 
external connection port 232 may provide for any interconnection, as needed, between a remote 
device and the bus system 202 through the communications device 230. For example, the 
communications device 230 might be an Ethernet adapter, which is connected via the 
connection port 232 to, for example, an external DSL modem. Note that depending upon the 
actual implementation of a computer system, the computer system may include some, all, more, 
or a rearrangement of components in the block diagram. For example, a thin client might consist 
of a wireless hand held device that lacks, for example, a traditional keyboard. Thus, many 
variations on the system of Figure 2 are possible. 

[0025] Referring back to Figure 1, clients 108-1 through 108-N are effectively connected 
to web sites, application service providers, search engines, and/or database resources 
represented by servers, such as servers 104-1 through 104-M, via the network 102. The 
web browser and/or other applications are generally running on the clients 108-1 through 
108-N, while information generally resides on the servers 104-1 through 104-M. For ease 
of explanation, a single client 108-1 will be considered to illustrate one embodiment of the 
present techniques. It will be readily apparent that such techniques can be easily applied 
to multiple clients. 

[0026] A subsystem may be, but is not limited to, one or more of the elements of Figure 
2. For example, Storage 210 may have a subsystem that handles how data is to be stored 
and retrieved. Audio 222 may have a subsystem that handles when to, for example, power 
down speakers. Communications device 230 may, for example, have a subsystem that 
needs to transfer information to the Storage 210 without using the main operating system 
upon receiving a message. 

[0027] Clients 108-1 through 108-N may be connected to receive information from either 
a single server, such as 104-1, and/or a series of servers, such as 104-1 through 104-M. 
Because of the variety of connection possibilities, each connection may have a different 
bandwidth. Under such a circumstance, it is advisable to match the transmission bit rate to 
the bandwidth so as not to overload or delay transmission of information. Thus, encoding 
of the information at different bit rates is beneficial. In the case of real-time transmissions, 
such as video, the matching of encoding bit rate to channel bandwidth will allow for the 
highest quality real-time display of the video information without gaps, pauses, or freezes. 
A multi-rate encoder may be located at the servers (104-1 through 104-M) and/or as part of 
the network 102 serving clients (108-1 through 108-N). Additionally, the originating source 
of the information may do the multi-rate encoding and simply make it available to, for 
example, servers (104-1 through 104-M). What is to be appreciated is that multi-rate encoding 
may serve clients ranging from those having a very high bandwidth, such as a DSL connection, 
to those having a very small bandwidth, such as a wireless link for a pager. 
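
By way of illustration only, the matching of an encoded bit rate to a client's channel bandwidth might be sketched as follows. The function name pick_stream and the example bit rates are hypothetical and are not part of the described embodiments; the sketch simply assumes that the stream with the highest bit rate not exceeding the channel bandwidth is preferred.

```python
def pick_stream(stream_rates_bps, channel_bps):
    """Return the highest encoded bit rate that fits the channel, or None if none fits."""
    fitting = [rate for rate in stream_rates_bps if rate <= channel_bps]
    return max(fitting) if fitting else None


# Hypothetical set of K = 4 encoded streams, in bits per second.
rates = [28_000, 48_000, 300_000, 1_200_000]

dsl_choice = pick_stream(rates, 1_500_000)   # high-bandwidth client
modem_choice = pick_stream(rates, 56_000)    # 56K modem client
```

A client whose bandwidth is below every available stream receives no match in this sketch; an actual system may instead fall back to the lowest-rate stream.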
[0028] Figure 3 illustrates in block diagram form one embodiment of a multi-rate encoder 
300. The input stream of data at bit rate J 302 enters the encoder 304, and produces a 
variety of output streams at various output bit rates (306-1 through 306-K). 
[0029] Figure 4 illustrates a time sequence for one embodiment of a multi-rate encoder. 
Notationally, $f_n^k$ represents a frame (f) where the subscript n refers to the frame number 
and the superscript k refers to a bit rate. In this embodiment, a given frame is encoded for 
each of the respective bit rates before the next frame is encoded. For example, frame 1 
($f_1^1$) 402-1 is encoded with a bit rate 1 before frame 2, $f_2^1$ 404-1, is encoded. Also, frame 1 
for all the bit rates (1, 2, ..., K), $f_1^1$ 402-1, $f_1^2$ 402-2, ..., $f_1^K$ 402-K, is encoded before the next 
respective frame for a given bit rate. Thus, $f_1^1$ at 402-1, $f_1^2$ at 402-2, and $f_1^K$ at 402-K may all be 
encoded before $f_2^1$ at 404-1, $f_2^2$ at 404-2, and $f_2^K$ at 404-K. 

[0030] Thus an embodiment of the invention may encode a video sequence $\{f_n\}$ in K 
independent streams at K different bit rates and/or video quality simultaneously. The process 
by which this is accomplished will be described by assuming that the first n-1 frames, 
$f_1, f_2, \ldots, f_{n-1}$, have been encoded. Simultaneous encoding as used in this description means that 
the encoding process for the nth frame of all K streams is completed before the encoding 
process for the (n+1)th frame is started for any of the encoded streams. 
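
The ordering just described, in which every stream's version of a frame is encoded before any stream moves to the next frame, might be sketched as a pair of nested loops. The function name encoding_order is hypothetical, and the sketch only enumerates the order of (frame, stream) pairs; no encoding is performed.

```python
def encoding_order(num_frames, num_streams):
    """List (frame n, stream k) pairs in the order described for Figure 4."""
    order = []
    for n in range(1, num_frames + 1):         # outer loop: frames
        for k in range(1, num_streams + 1):    # inner loop: bit rates 1..K
            order.append((n, k))
    return order


order = encoding_order(num_frames=2, num_streams=3)
```

With two frames and three streams, frame 1 appears for streams 1 through 3 before frame 2 appears for any stream.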
[0031] Figure 5 illustrates one embodiment for an encoder for the first stream. Encoding of 
the nth frame of the sequence, $f_n$ 502, starts with the computation of its representation in a 
transformed space. Let $F_n$ 506 represent the transformed frame. For MPEG encoding, this 
transform is the discrete cosine transform, DCT 504, performed on blocks of 8x8 pixels. This 
operation is performed only once on frame $f_n$ 502. 
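
As a non-limiting sketch of the transform step, the following computes an orthonormal two-dimensional DCT on each 8x8 block of a frame, once per frame. The helper names (dct_matrix, matmul, dct2, idct2, transform_frame) are hypothetical, the orthonormal scaling is an assumption, and the frame dimensions are assumed to be multiples of 8; a practical encoder would typically use a fast DCT rather than explicit matrix products.

```python
import math

N = 8  # block size used for the MPEG-style DCT


def dct_matrix(n=N):
    """Orthonormal DCT-II basis matrix: row u is the u-th cosine basis vector."""
    rows = []
    for u in range(n):
        scale = math.sqrt(1.0 / n) if u == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                     for x in range(n)])
    return rows


def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


C = dct_matrix()
C_T = [list(col) for col in zip(*C)]


def dct2(block):
    """2-D DCT of one N x N block: C * block * C^T."""
    return matmul(matmul(C, block), C_T)


def idct2(coeffs):
    """Inverse 2-D DCT: C^T * coeffs * C."""
    return matmul(matmul(C_T, coeffs), C)


def transform_frame(frame):
    """Transform a frame once, block by block, keyed by each block's (row, col) origin."""
    blocks = {}
    for by in range(0, len(frame), N):
        for bx in range(0, len(frame[0]), N):
            block = [row[bx:bx + N] for row in frame[by:by + N]]
            blocks[(by, bx)] = dct2(block)
    return blocks


flat_frame = [[10.0] * 16 for _ in range(8)]   # 8 x 16 frame of constant intensity
F_n = transform_frame(flat_frame)
```

For a block of constant intensity, only the DC coefficient of each transformed block is non-zero, which is why smooth image regions quantize compactly.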

[0032] From this transformed frame $F_n$ 506, we then subtract 508 its predicted transformed 
representation, $\tilde{F}_n^1$ 534 (also referred to as transformed predicted data). The index 1 refers 
here to the data associated with the encoding of the first stream. The difference between $F_n$ 
506 and $\tilde{F}_n^1$ 534, $E_n^1$ 510, is then quantized, using the quantizer ($Q_1$ 512) associated with the first 
stream. The quantizer's output 514, which consists of quantization levels, is then encoded 
without loss (Entropy coder 516) to generate the coefficient data, noted as coef. data (1) 
518. In the feedback loop of the encoder, the quantization levels representing $E_n^1$ 510 are then 
mapped to their respective values ($Q_1^{-1}$ 520) to generate $\hat{E}_n^1$ 522, the quantized representation 
of $E_n^1$ 510. $\tilde{F}_n^1$ 534 is then added (524) to $\hat{E}_n^1$ 522 to form $\hat{F}_n^1$ 526, the transformed representation 
of the reconstructed nth frame of the first stream. $\hat{F}_n^1$ 526 is then stored in a frame buffer 528. 
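
The feedback loop of Figure 5 might be sketched as follows for a short list of transformed coefficients. This is a minimal sketch under stated assumptions: a uniform quantizer with a single step size stands in for the quantizer 512, the entropy coder 516 is omitted, and the names quantize, dequantize, and encode_block are hypothetical.

```python
def quantize(values, step):
    """Map each value to an integer quantization level (uniform quantizer, assumed)."""
    return [int(round(v / step)) for v in values]


def dequantize(levels, step):
    """Map quantization levels back to their representative values."""
    return [q * step for q in levels]


def encode_block(transformed, predicted, step):
    """One pass through the loop: subtract, quantize, de-quantize, add back."""
    difference = [a - b for a, b in zip(transformed, predicted)]       # subtractor 508
    levels = quantize(difference, step)                                # quantizer 512
    quantized_diff = dequantize(levels, step)                          # de-quantizer 520
    reconstructed = [p + e for p, e in zip(predicted, quantized_diff)] # adder 524
    return levels, reconstructed


transformed = [80.0, -13.0, 4.0, 1.0]   # hypothetical transformed coefficients (506)
predicted = [72.0, -10.0, 0.0, 0.0]     # hypothetical predicted coefficients (534)
levels, reconstructed = encode_block(transformed, predicted, step=4)
```

The reconstructed values, rather than the original ones, are what would be stored in the frame buffer 528, so that the encoder predicts from the same data a decoder will have. With a uniform quantizer, each reconstructed coefficient differs from the original by at most half a step.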
[0033] The computation of the predicted transformed representation $\tilde{F}_n^1$ 534 of the frame can 
be done without any other frame information (as for intra picture (I) frames in Moving 
Picture Experts Group (MPEG) encoding), or by using motion information relating frame $f_n$ to one 
or many frames previously encoded. These frames can be temporally anterior and/or 
posterior to frame $f_n$. For simplicity, Figure 5 illustrates only the case when the 
immediately previously reconstructed frame is used. More specifically, in Figure 5, $\hat{F}_{n-1}^1$ 530 
is motion compensated (M.C. 532) to generate $\tilde{F}_n^1$ 534, the transformed predicted 
representation of $f_n$. Note that the motion compensation is performed in the transformed 
domain. 

[0034] The motion information may be computed (motion estimation M.E. 550) using all 
or some of the following frames: $f_n$ 502, $f_{n-1}$ 542, $F_n$ 506, $\hat{F}_{n-1}^1$ 530, and $\hat{f}_{n-1}^1$ (obtained from 
an inverse discrete cosine transform, IDCT, of $\hat{F}_{n-1}^1$ 530). These frames are the current and 
previously reconstructed frames, both in the spatial and transformed domains, as well as the 
preceding original frame. For example, one could compute motion information with pixel 
precision using $f_n$ and $f_{n-1}$, and refine that motion information to 1/2 pixel accuracy using 
the transformed frames $F_n$ and $\hat{F}_{n-1}^1$. The motion estimation 550 outputs motion 
vector data (M.V. 552), which is used by the motion compensation 532 block and an 
Entropy coder 554 to generate entropy coded motion vector data (ec mv data) 556. 
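
The pixel-precision stage of the motion estimation might, for example, be realized as a full-search block match between the current and preceding original frames. The sketch below is an assumption rather than the described embodiment: the sum of absolute differences (SAD) cost, the block size, the search range, and the names sad and estimate_motion are all hypothetical, and the half-pixel refinement in the transformed domain is not shown.

```python
def sad(cur, ref, by, bx, dy, dx, size):
    """Sum of absolute differences between a block of cur and a displaced block of ref."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def estimate_motion(cur, ref, by, bx, size=4, search=2):
    """Full-search, integer-pixel motion vector (dy, dx) for the block at (by, bx)."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates that would read outside the reference frame.
            if by + dy < 0 or by + dy + size > height:
                continue
            if bx + dx < 0 or bx + dx + size > width:
                continue
            cost = sad(cur, ref, by, bx, dy, dx, size)
            if best is None or cost < best[0]:
                best = (cost, (dy, dx))
    return best[1]


# Hypothetical test frames: a ramp image and a copy displaced by (+1, -1).
ref = [[12 * y + x for x in range(12)] for y in range(12)]
cur = [[ref[y + 1][x - 1] if y + 1 < 12 and x - 1 >= 0 else 0
        for x in range(12)] for y in range(12)]
mv = estimate_motion(cur, ref, by=4, bx=4)
```

Here the vector points from the block in the current frame to its best match in the reference frame; other sign conventions are equally possible.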
[0035] Figure 6 illustrates one embodiment for an encoder for subsequent streams. The 
encoding of $f_n$ 502 at other rates or quality levels reuses data that was obtained when 
generating the previous streams. The index i refers here to the data associated with the 
encoding of the ith stream. 

[0036] As with the generation of stream 1, the predicted frame $\tilde{F}_n^i$ 636 is subtracted 604 from 
$F_n$ 602. The difference, $E_n^i$ 606, is then quantized ($Q_i$ 608) and the values of the quantization 
levels 610 are encoded (Entropy coder 612) without loss to generate coef. data (i) 614. The 
reconstructed difference (from $Q^{-1}$ 616), $\hat{E}_n^i$ 618, is then added 620 to $\tilde{F}_n^i$ 636 to generate the nth 
reconstructed frame of stream i, $\hat{F}_n^i$ 622. 

[0037] The computation of the predicted frame $\tilde{F}_n^i$ 636 may differ from that of frame $\tilde{F}_n^1$ 534 
as shown in Figures 6 and 7. For example, in this embodiment, Figure 7 shows that the 
predicted frame $\tilde{F}_n^i$ 710 can be generated in three different ways. Note that the sub-system 
used to generate $\tilde{F}_n^i$ (634, 700) can be changed within the same frame on a block basis. The 
first sub-system uses the predicted frame from a previous stream, i.e., $\tilde{F}_n^j$ 632, 702, for 
example, the previous stage (j = i-1). In the second sub-system, motion information, previously 
computed for stream 1 (j = i-1), is used for the motion compensation of the difference of frames 
$\hat{F}_{n-1}^i$ 626, 708, and $\hat{F}_{n-1}^j$ 628, 706, to which $\tilde{F}_n^j$ 632, 702 is then added. Finally, the third 
sub-system is like what is done for stream 1, i.e., frame $\hat{F}_{n-1}^i$ 626, 708 is motion compensated. 
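
The three sub-systems might be sketched as a selectable prediction function. Several assumptions are made here and are not taken from the embodiments: frames are short lists of numbers, a cyclic shift stands in for the motion compensation block (chosen only because it is a linear operation), the difference is taken as stream i minus stream j, and the names motion_compensate and predict and the mode labels are hypothetical.

```python
def motion_compensate(frame, shift):
    """Stand-in for the M.C. block: a cyclic shift, which is a linear operation."""
    s = shift % len(frame)
    return frame[-s:] + frame[:-s] if s else list(frame)


def predict(mode, mv, rec_prev_i, rec_prev_j, pred_j):
    """Predicted frame for stream i using one of the three sub-systems."""
    if mode == "reuse":
        # First sub-system: take the predicted frame of the earlier stream j as is.
        return list(pred_j)
    if mode == "difference":
        # Second sub-system: motion compensate the difference of the previous
        # reconstructed frames (assumed i minus j), then add stream j's prediction.
        diff = [a - b for a, b in zip(rec_prev_i, rec_prev_j)]
        return [p + d for p, d in zip(pred_j, motion_compensate(diff, mv))]
    if mode == "full":
        # Third sub-system: motion compensate stream i's own previous frame.
        return motion_compensate(rec_prev_i, mv)
    raise ValueError("unknown mode: " + mode)


rec_prev_i = [10, 20, 30, 40]     # hypothetical previous reconstructed frame, stream i
rec_prev_j = [12, 18, 33, 41]     # hypothetical previous reconstructed frame, stream j
mv = 1
pred_j = motion_compensate(rec_prev_j, mv)   # stream j's prediction for frame n

by_reuse = predict("reuse", mv, rec_prev_i, rec_prev_j, pred_j)
by_difference = predict("difference", mv, rec_prev_i, rec_prev_j, pred_j)
by_full = predict("full", mv, rec_prev_i, rec_prev_j, pred_j)
```

Because the stand-in motion compensation is linear and uses all coefficients, the second and third modes give identical predictions in this sketch. Since the mode can change on a block basis, a practical encoder might call such a function once per block with a per-block mode.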
[0038] For the last two sub-systems described above, one may implement the motion 
compensation so that not all transformed coefficients are used. Although this may introduce a 
mismatch error with the decoder, it may be used to speed up the motion compensation. 
[0039] Additionally, it is to be understood that while the above example, for simplicity, used 
the directly previously encoded stage, i.e., j = i-1, j may represent any previously encoded stage. 
All that is required is j < i. 

[0040] Once the information describing the nth frame for all K streams has been computed as 
described above, the process starts over with the following frames: first, encoding of $f_{n+1}$ for 
stream 1, followed by the encoding of the same frame, $f_{n+1}$, for all other streams, and so on 
until the video sequence has been completely encoded. 
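
The overall process might be sketched as follows, to show that the transform is computed once per frame and shared by all K streams while each stream keeps its own quantizer and frame buffer. This is a simplified sketch under stated assumptions: the transform is replaced by an identity placeholder, prediction is zero-motion (each stream predicts from its own previous reconstructed frame), entropy coding is omitted, and the names transform and encode_sequence are hypothetical.

```python
def transform(frame):
    """Placeholder for the transform 504; the identity keeps the sketch short."""
    return list(frame)


def encode_sequence(frames, steps):
    """Encode each frame for all K streams before moving on to the next frame."""
    num_streams = len(steps)
    buffers = [[0.0] * len(frames[0]) for _ in range(num_streams)]  # one buffer per stream
    levels_out = [[] for _ in range(num_streams)]
    transforms_done = 0
    for frame in frames:
        transformed = transform(frame)       # computed once, shared by all streams
        transforms_done += 1
        for i, step in enumerate(steps):
            predicted = buffers[i]           # zero-motion prediction (assumption)
            levels = [int(round((a - p) / step))
                      for a, p in zip(transformed, predicted)]
            buffers[i] = [p + q * step for p, q in zip(predicted, levels)]
            levels_out[i].append(levels)
    return levels_out, transforms_done


frames = [[8.0, 16.0], [13.0, 16.0]]         # two hypothetical two-sample frames
streams, transforms_done = encode_sequence(frames, steps=[4, 8])
```

Two frames encoded into two streams require only two transform computations in this sketch, rather than four as in the brute-force approaches.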

[0041] Figure 7 illustrates one embodiment for motion compensation. It has three 
different ways of doing motion compensation as was mentioned above. First, when $\hat{F}_{n-1}^i$ 
708 is connected to 708-1 and $\tilde{F}_n^i$ 710 is connected to 710-1, then the motion 
compensation (M.C. 720) is derived from the inputs 721, the motion vector data (M.V.) and 
$\hat{F}_{n-1}^i$ 708, while $\tilde{F}_n^j$ 702 is not utilized. 

[0042] When $\hat{F}_{n-1}^i$ 708 is connected to 708-2 and $\tilde{F}_n^i$ 710 is connected to 710-2, then the 
output $\tilde{F}_n^i$ 710 is derived from the input $\tilde{F}_n^j$ 702 summed 716 with the output 718 of the 
motion compensation block (M.C.) 730 having the inputs of the motion vector data (M.V.) 
704 and the output 714, from the difference 712 between the inputs $\hat{F}_{n-1}^j$ 706 and $\hat{F}_{n-1}^i$ 708. 
[0043] Finally, if $\hat{F}_{n-1}^i$ 708 is connected to 708-3 and $\tilde{F}_n^i$ 710 is connected to 710-3, then 
the output $\tilde{F}_n^i$ 710 is derived solely from the input $\tilde{F}_n^j$ 702. 
[0044] Figure 8 illustrates a prior art decoder for streams and is provided for 
completeness in understanding how the signals encoded in the present invention may be 
decoded. Input coefficient data 802 (Coef. data(i)) enters an Entropy decoder 804 and the 
output 806 then generates a reconstructed frame 810 (from $Q^{-1}$ 808). From here the signal 
810 goes through an inverse DCT (IDCT 812) and the output is $\hat{e}_n^i$ 814. The output 814 is 
then summed 816 with $\tilde{f}_n^i$ 826 to produce the output 818. $\tilde{f}_n^i$ 826 is produced by the 
motion compensation block (M.C.) 824, which has as inputs 822 coming from Frame 
Buffer 820, and 832 from the Entropy decoder 830, whose input is the entropy coded motion 
vector data (ec mv data) 828. 
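
The decoding path of Figure 8 might be sketched as below. The sketch is illustrative only and rests on assumptions: a one-dimensional orthonormal inverse DCT stands in for the two-dimensional 8x8 IDCT 812, a uniform de-quantizer stands in for block 808, the entropy decoders 804 and 830 and the motion compensation 824 are omitted (the predicted pixels are simply supplied), and the names idct_1d and decode_block are hypothetical.

```python
import math


def idct_1d(coeffs):
    """Orthonormal inverse DCT (DCT-III) of a list of coefficients."""
    n = len(coeffs)
    out = []
    for x in range(n):
        total = 0.0
        for u, c in enumerate(coeffs):
            scale = math.sqrt(1.0 / n) if u == 0 else math.sqrt(2.0 / n)
            total += scale * c * math.cos((2 * x + 1) * u * math.pi / (2 * n))
        out.append(total)
    return out


def decode_block(levels, step, predicted_pixels):
    """De-quantize, inverse transform, then add the motion compensated prediction."""
    coeffs = [q * step for q in levels]                        # de-quantizer 808
    residual = idct_1d(coeffs)                                 # IDCT 812, output 814
    return [p + r for p, r in zip(predicted_pixels, residual)] # adder 816, output 818


decoded = decode_block(levels=[2, 0, 0, 0], step=4,
                       predicted_pixels=[100.0, 100.0, 100.0, 100.0])
```

In this example only the DC level is non-zero, so the residual is a constant offset added to every predicted pixel.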

[0045] In the above embodiments, functional blocks denoting well known operations, 
such as the discrete cosine transform, quantization, de-quantization, summation, 
difference, entropy coding, frame buffer, motion estimation, etc. have not been detailed in 
order to not obscure the description. It is to be understood however, that these functions 
are well known in the art and may be implemented in hardware and/or software, either on 
general purpose and/or dedicated computers or microcomputers. For example, frame 
buffer 528, which functions as a time delay element, may be implemented on a computer 
running software by storing the data and some time later retrieving it, or in hardware by 
using a standard hardware based frame buffer. 

[0046] It is to be appreciated that the architecture and functionality described above may 
have other embodiments. For example, in Figure 5 the transform block 504 is the discrete 
cosine transform. Other embodiments may use other transforms and/or functions, and their 
inverses, alone or in combinations, such as, wavelet, edgelet, Fourier, Walsh, Hadamard, 
Hartley, Haar, sine, cosine, hyperbolic, convolution, correlation, autocorrelation, 
modulation, decimation, interpolation, etc. as may be beneficial based upon the input signal 
characteristics. 

[0047] Additionally, it is to be appreciated that the present invention may code each of 
the K independent streams at K different bit rates as either a constant bit rate (CBR) and/or 
variable bit rate (VBR) to achieve the system goal of bit rate and/or video quality. Likewise, 
the quantizer and de-quantizer referred to may be of either a fixed and/or variable 
resolution. This resolution may be in response to CBR and/or VBR requirements. 
[0048] Finally, one is to appreciate that as previously mentioned, the present invention 
places no temporal restrictions on the input data. Thus, temporally anterior and/or posterior 
encoding may be performed. 

[0049] Thus, a method and apparatus for multi-rate encoding of 
video sequences have been described. Although the present invention has been described 
with reference to specific exemplary embodiments, it will be evident that various 
modifications and changes may be made to these embodiments without departing from the 
broader spirit and scope of the invention as set forth in the claims. Accordingly, the 
specification and drawings are to be regarded in an illustrative rather than a restrictive 
sense. 